State-of-the-Art in Weighted Finite-State Spell-Checking
Authors
Abstract
The following claims can be made about finite-state methods for spell-checking: 1) Finite-state language models provide support for morphologically complex languages that word lists, affix stripping and similar approaches do not provide; 2) Weighted finite-state models have expressive power equal to other, state-of-the-art string algorithms used by contemporary spell-checkers; and 3) Finite-state models are at least as fast as other string algorithms for lookup and error correction. In this article, we use some contemporary non-finite-state spell-checking methods as a baseline and perform tests in light of the claims, to evaluate state-of-the-art finite-state spell-checking methods. We verify that finite-state spell-checking systems outperform the traditional approaches for English. We also show that the models for morphologically complex languages can be made to perform on par with
Similar resources
Creating and Weighting Hunspell Dictionaries as Finite-State Automata
There are numerous formats for writing spell-checkers for open-source systems and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus training in order to improve the spellchecking sugg...
Finite-State Spell-Checking with Weighted Language and Error Models: Building and Evaluating Spell-Checkers with Wikipedia as Corpus
In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology to...
Processing Mutations in Breton with Finite-State Transducers
One characteristic feature of Celtic languages is mutation, i.e. the fact that the initial consonant of words may change according to the context. We provide a quick description of this linguistic phenomenon for Breton along with a formalization using finite-state transducers. This approach allows an exact and compact description of mutations. The result can be used in various contexts, especia...
Analysis of the Spell of Rainy Days in Lake Urmia Basin using Markov Chain Model
In this study, the frequency and the spell of rainy days were analyzed in the Lake Urmia basin using a Markov chain model. For this purpose, the daily precipitation data of 7 synoptic stations in the Lake Urmia basin were used for the period 1995-2014. The daily precipitation data at each station were classified into the wet and dry state and the fitness of first order Markov chain on data series was e...
Compiling Apertium morphological dictionaries with HFST and using them in HFST applications
In this paper we aim to improve interoperability and re-usability of the morphological dictionaries of the Apertium machine translation system by formulating a generic finite-state compilation formula that is implemented in the HFST finite-state system to compile Apertium dictionaries into general-purpose finite-state automata. We demonstrate the use of the resulting automaton in FST-based spell-checki...